Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move storage integrations to be configured as services (PP-95) #1377

Merged
merged 7 commits into from
Sep 22, 2023

Conversation

jonathangreen
Copy link
Member

@jonathangreen jonathangreen commented Sep 13, 2023

Description

  • Add a dependency injection container, that contains services used by our controllers.
  • Create framework for adding new services, and configuring them via environment variables, validated via pydantic.
  • Remove the existing mirror integration code, and replace it with a new storage service.
    • Modify the marc exporter and s3 analytics provider to use the new storage service.
    • Add the new environment variables for the storage service to README.

Notes 🚨: (these are now all resolved)

Motivation and Context

  • JIRA: PP-95
  • The motivation here is to allow services that don't need to be configured in the admin UI to be configured via environment variables, and instantiated via a dependency injection container, so we are able to easily mock the services for testing.
  • This uses the storage service as a test case for the new services architecture, and implements it as a s3 storage service.

How Has This Been Tested?

Notes: The docker-compose.yml included in the repo was updated with the new environment variables, so that environment can be used for integration testing.

  • Manually tested in a docker-compose environment.
    • Imported the Palace Bookshelf collection.
    • Configured s3 analytics and marc integrations.
    • Checked out a book.
      • Made sure analytics json files ended up in minio bucket.
    • Ran the cache_marc_files script.
      • Made sure that marc files ended up in the public access bucket.
      • Made sure links to the files worked at the /marc endpoint.
  • Ran unit tests locally and in CI.

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

@jonathangreen jonathangreen changed the base branch from main to feature/remove-content-mirroring September 13, 2023 00:46
@jonathangreen jonathangreen changed the title Move storage integrations to be configured as services Move storage integrations to be configured as services (PP-95) Sep 13, 2023
@codecov
Copy link

codecov bot commented Sep 13, 2023

Codecov Report

Patch coverage: 93.27% and project coverage change: +0.05% 🎉

Comparison is base (ac78b47) 90.17% compared to head (42586d2) 90.22%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1377      +/-   ##
==========================================
+ Coverage   90.17%   90.22%   +0.05%     
==========================================
  Files         226      229       +3     
  Lines       29871    29905      +34     
  Branches     6845     6831      -14     
==========================================
+ Hits        26936    26983      +47     
+ Misses       1893     1880      -13     
  Partials     1042     1042              
Files Changed Coverage Δ
api/admin/controller/__init__.py 100.00% <ø> (ø)
api/admin/routes.py 94.70% <ø> (-0.13%) ⬇️
core/model/collection.py 95.14% <0.00%> (-0.86%) ⬇️
core/model/configuration.py 90.94% <ø> (-0.68%) ⬇️
core/model/resource.py 86.25% <ø> (-0.61%) ⬇️
api/s3_analytics_provider.py 68.11% <68.00%> (-7.21%) ⬇️
core/service/storage/s3.py 93.96% <93.96%> (ø)
api/admin/config.py 96.05% <100.00%> (ø)
api/admin/controller/analytics_services.py 96.10% <100.00%> (+3.33%) ⬆️
api/admin/controller/catalog_services.py 89.33% <100.00%> (-0.03%) ⬇️
... and 11 more

... and 3 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jonathangreen jonathangreen marked this pull request as ready for review September 13, 2023 15:08
@jonathangreen jonathangreen requested a review from a team September 13, 2023 15:09
@jonathangreen jonathangreen added UI Update This PR requires an admin UI update cleanup migration PR that will need a cleanup migration once its been fully deployed incompatible changes Changes that require a new major version labels Sep 13, 2023
@jonathangreen jonathangreen force-pushed the feature/remove-content-mirroring branch from a13d3b1 to fe97e88 Compare September 15, 2023 12:03
Base automatically changed from feature/remove-content-mirroring to main September 15, 2023 12:40
@jonathangreen jonathangreen force-pushed the feature/storage-integration branch 2 times, most recently from d739045 to 3d690c5 Compare September 19, 2023 12:44
@jonathangreen
Copy link
Member Author

This should be ready to merge now, once it gets a code review. All the dependant PRs have gone in, and its been rebased against the current main branch.

@tdilauro
Copy link
Contributor

I started looking at this yesterday and will continue today.

@tdilauro tdilauro self-requested a review September 20, 2023 13:11
Copy link
Contributor

@tdilauro tdilauro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still looking at the tests, but the app generally looks good. I do have a concern about relying on the Dependency Injector project because it's not clear that it is being maintained (they haven't merged a PR in 9 months).

api/s3_analytics_provider.py Show resolved Hide resolved
bin/odl2_import_monitor Show resolved Hide resolved
core/marc.py Show resolved Hide resolved
core/marc.py Show resolved Hide resolved
core/mock_analytics_provider.py Show resolved Hide resolved
core/model/collection.py Show resolved Hide resolved
core/scripts.py Outdated
Comment on lines 127 to 129
if services is None:
services = container_instance()
self._services = services
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor (style) - I think this is more clear as a ternary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I agree here, but I updated this one anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why it's "minor" and "style". I def won't be offended if you disagree or otherwise decide not to change it. I'll try to use "style pref" in the future, to make that more clear.

I generally like the ternary in these cases because it does two things:

  1. Makes it more clear to me that the main thing happening is that self._services is gonna get something assigned, either a specified value or a default.
  2. Makes it impossible for what's happening in ll. 127-128 and l. 129 above to get separated in the code.

(2) here is also why I'm a fan of the walrus operator, though I never suggest that anyone change that one, since it seems more controversial.



class StorageConfiguration(ServiceConfiguration):
region: str = "us-east-1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we allow a default here? It might lead to some non-obvious misconfiguration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I removed the default here. This required a good number of code changes. I pushed a new commit that:

  • Updates documentation
  • Updates tests
  • Adds new tests to s3 class constructor to make sure the URL template is consistent with the now optional region parameter.

scripts.py Outdated Show resolved Hide resolved
@jonathangreen
Copy link
Member Author

I pushed a few commits resolving code review feedback.

I do have a concern about relying on the Dependency Injector project because it's not clear that it is being maintained (they haven't merged a PR in 9 months).

Its hard to find a dependency injection container for Python. In the past I've found that constructor injection with a dependency injection container to wire up the dependencies is a helpful structure for webapps, and my hope is that we can slowly decouple our services this way.

Despite not having any PRs merged in the last 9 months, the library does have 1 million downloads a month on Pypy making it by far the most popular dependency injection container I've found for Python. It also has well written documentation https://python-dependency-injector.ets-labs.org/ and has been around since 2015.

There is a issue here ets-labs/python-dependency-injector#688 where the maintainer mentions that he considers the library feature complete at this point, but intends to support it to update python versions, dependency versions etc.

If you have a suggestion of a different library, I'm willing to take a look at switching this code. Otherwise the library becoming unsupported is a risk, but IMO the immediate gain of decoupling our services is worth it.

@tdilauro
Copy link
Contributor

Otherwise the library becoming unsupported is a risk, but IMO the immediate gain of decoupling our services is worth it.

I'm fine with that (and agree about the docs). Just wanted to ensure that you were aware of the support situation. Along with the time since last commit, it was that same issue 688 along with issue 742 in that project that gave me pause.

@jonathangreen jonathangreen force-pushed the feature/storage-integration branch from 9a49beb to 7371a07 Compare September 22, 2023 15:11
@jonathangreen
Copy link
Member Author

I rebased this with main to resolve the conflicts with #1268. The only changes in the rebase other then imports and poetry are in tests/core/test_marc.py around creating the fake search index for the marc tests, since that changed in #1268.

@jonathangreen jonathangreen force-pushed the feature/storage-integration branch from 7371a07 to 42586d2 Compare September 22, 2023 15:38
Copy link
Contributor

@tdilauro tdilauro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! 🚀

One question about whether a test might need a network connection.


def test_region_validation_fail():
with pytest.raises(CannotLoadConfiguration) as exc_info:
StorageConfiguration(region="foo bar baz")
Copy link
Contributor

@tdilauro tdilauro Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the StorageConfiguration's validator's region check need a network connection?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, but I'll verify.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like it. It seems like the available regions get loaded from the huge amount of json data that is included with botocore.

@jonathangreen jonathangreen merged commit 778ed79 into main Sep 22, 2023
@jonathangreen jonathangreen deleted the feature/storage-integration branch September 22, 2023 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup migration PR that will need a cleanup migration once its been fully deployed incompatible changes Changes that require a new major version UI Update This PR requires an admin UI update
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants